Expansion of tandem repeats in sea anemone Nematostella vectensis proteome: A source for gene novelty?

Author: Fromer Menachem
Linial Michal
Naamati Guy
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: The complete proteome of the starlet sea anemone, Nematostella vectensis, provides insights into gene invention dating back to the Cnidarian-Bilaterian ancestor. With the addition of the complete proteomes of Hydra magnipapillata and Monosiga brevicollis, the investigation of proteins having unique features in early metazoan life has become practical. We focused on the properties and the evolutionary trends of tandem repeat (TR) sequences in Cnidaria proteomes. Results: We found that 11-16% of N. vectensis proteins contain tandem repeats. Most TRs cover 150 amino acid segments that are composed of basic units of 5-20 amino acids. In total, the N. vectensis proteome has about 3300 unique TR-units, but only a small fraction of them are shared with H. magnipapillata, M. brevicollis, or mammalian proteomes. The overall abundance of these TRs stands out relative to that of 14 proteomes representing the diversity among eukaryotes and within the metazoan world. TR-units are characterized by a unique composition of amino acids, with cysteine and histidine being over-represented. Structurally, most TR-segments are associated with coiled and disordered regions. Interestingly, 80% of the TR-segments can be read in more than one open reading frame. For over 100 of them, translation of the alternative frames would result in long proteins. Most domain families that are characterized as repeats in eukaryotes are found in the TR-proteomes from Nematostella and Hydra. Conclusions: While most TR-proteins have originated from prediction tools and are still awaiting experimental validation, supportive evidence exists for hundreds of TR-units in Nematostella. The existence of TR-proteins in early metazoan life may have served as a robust mode for novel genes with previously overlooked structural and functional characteristics.

Springer - Publisher Connector

Codon usage is associated with the evolutionary age of genes in metazoan genomes

Author: Fromer Menachem
Linial Michal
Linial Nathan
Prat Yosef
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Codon usage may vary significantly between different organisms and between genes within the same organism. Several evolutionary processes have been postulated to be the predominant determinants of codon usage: selection, mutation, and genetic drift. However, the relative contribution of each of these factors in different species remains debatable. The availability of complete genomes for tens of multicellular organisms provides an opportunity to inspect the relationship between codon usage and the evolutionary age of genes. Results: We assign an evolutionary age to a gene based on the relative positions of its identified homologues in a standard phylogenetic tree. This yields a classification of all genes in a genome to several evolutionary age classes. The present study starts from the observation that each age class of genes has a unique codon usage and proceeds to provide a quantitative analysis of the codon usage in these classes. This observation is made for the genomes of Homo sapiens, Mus musculus, and Drosophila melanogaster. It is even more remarkable that the differences between codon usages in different age groups exhibit similar and consistent behavior in various organisms. While we find that GC content and gene length are also associated with the evolutionary age of genes, they can provide only a partial explanation for the observed codon usage. Conclusion: While factors such as GC content, mutational bias, and selection shape the codon usage in a genome, the evolutionary history of an organism over hundreds of millions of years is an overlooked property that is strongly linked to GC content, protein length, and, even more significantly, to the codon usage of metazoan genomes.

Springer - Publisher Connector

A functional hierarchical organization of the protein sequence space

Author: Friedlich Moriah
Fromer Menachem
Kaplan Noam
Linial Michal
Publication venue: BioMed Central
Publication date: 14/12/2004

BACKGROUND: It is a major challenge of computational biology to provide a comprehensive functional classification of all known proteins. Most existing methods seek recurrent patterns in known proteins based on manually validated alignments of known protein families. Such methods can achieve high sensitivity, but are limited by the necessary manual labor. This makes our current view of the protein world incomplete and biased. This paper concerns ProtoNet, an automatic unsupervised global clustering system that generates a hierarchical tree of over 1,000,000 proteins, based solely on sequence similarity. RESULTS: In this paper we show that ProtoNet correctly captures functional and structural aspects of the protein world. Furthermore, a novel feature is an automatic procedure that reduces the tree to 12% of its original size. This procedure utilizes only parameters intrinsic to the clustering process. Despite the substantial reduction in size, the system's predictive power concerning biological functions is hardly affected. We then carry out an automatic comparison with existing functional protein annotations. Consequently, 78% of the clusters in the compressed tree (5,300 clusters) are assigned a biological function with high confidence. The clustering and compression processes are unsupervised and robust. CONCLUSIONS: We present an automatically generated unbiased method that provides a hierarchical classification of all currently known proteins.

Viral adaptation to host: a proteome-based analysis of codon usage and amino acid preferences

Author: Crill WD
Iris Bahir
Melnick JL
Menachem Fromer
Michal Linial
Truyen U
Yosef Prat
Publication venue: Nature Publishing Group

Viruses differ markedly in their specificity toward host organisms. Here, we test the level of general sequence adaptation that viruses display toward their hosts. We compiled a representative data set of viruses that infect hosts ranging from bacteria to humans. We consider their respective amino acid and codon usages and compare them among the viruses and their hosts. We show that bacteria-infecting viruses are strongly adapted to their specific hosts, but that they differ from other unrelated bacterial hosts. Viruses that infect humans, but not those that infect other mammals or aves, show a strong resemblance to most mammalian and avian hosts, in terms of both amino acid and codon preferences. In groups of viruses that infect humans or other mammals, the highest observed level of adaptation of viral proteins to host codon usages is for those proteins that appear abundantly in the virion. In contrast, proteins that are known to participate in host-specific recognition do not necessarily adapt to their respective hosts. The implication for the potential of viral infectivity is discussed.

CWI's Institutional Repository

A framework for the detection of de novo mutations in family-based sequencing data

Author: Bakker P.I.W. (Paul) de
Banks E. (Eric)
Cretu-Stancu M. (Mircea)
Daly M.J. (Mark)
DePristo M.A. (Mark)
Francioli L.C. (Laurent)
Fromer M. (Menachem)
Garimella K.V. (Kiran)
Genome of the Netherlands Consortium
Kloosterman W.P. (Wigard)
Neale B. (Benjamin)
Samocha K. (Kaitlin)
Publication venue: Springer Science and Business Media LLC
Publication date: 01/02/2017

Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated in the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low-coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between father's age at conception and the number of DNMs in female offspring's X chromosome, consistent with previous literature reports.

Clonal Hematopoiesis and Blood-Cancer Risk Inferred from Blood DNA Sequence

Author: Bakhoum Samuel F.
Chambert Kimberly
Fromer Menachem
Gabriel Stacey B.
Genovese Giulio
Gronberg Henrik
Handsaker Robert E.
Hoglund Martin
Hultman Christina M.
Kahler Anna K.
Landen Mikael
Lander Eric S.
Lehmann Soren
Lindberg Johan
McCarroll Steven A.
Mick Eran
Moran Jennifer L.
Neale Benjamin M.
Purcell Shaun M.
Rose Samuel A.
Sklar Pamela
Sullivan Patrick F.
Svantesson Oscar
Publication venue: Massachusetts Medical Society
Publication date: 01/01/2014

Background: Cancers arise from multiple acquired mutations, which presumably occur over many years. Early stages in cancer development might be present years before cancers become clinically apparent. Methods: We analyzed data from whole-exome sequencing of DNA in peripheral-blood cells from 12,380 persons, unselected for cancer or hematologic phenotypes. We identified somatic mutations on the basis of unusual allelic fractions. We used data from Swedish national patient registers to follow health outcomes for 2 to 7 years after DNA sampling. Results: Clonal hematopoiesis with somatic mutations was observed in 10% of persons older than 65 years of age but in only 1% of those younger than 50 years of age. Detectable clonal expansions most frequently involved somatic mutations in three genes (DNMT3A, ASXL1, and TET2) that have previously been implicated in hematologic cancers. Clonal hematopoiesis was a strong risk factor for subsequent hematologic cancer (hazard ratio, 12.9; 95% confidence interval, 5.8 to 28.7). Approximately 42% of hematologic cancers in this cohort arose in persons who had clonality at the time of DNA sampling, more than 6 months before a first diagnosis of cancer. Analysis of bone marrow–biopsy specimens obtained from two patients at the time of diagnosis of acute myeloid leukemia revealed that their cancers arose from the earlier clones. Conclusions: Clonal hematopoiesis with somatic mutations is readily detected by means of DNA sequencing, is increasingly common as people age, and is associated with increased risks of hematologic cancer and death. A subset of the genes that are mutated in patients with myeloid cancers is frequently mutated in apparently healthy persons; these mutations may represent characteristic early events in the development of hematologic cancers. (Funded by the National Human Genome Research Institute and others.)
Funding: National Human Genome Research Institute (U.S.) (Grant U54 HG003067); National Human Genome Research Institute (U.S.) (Grant R01 HG006855); Stanley Center for Psychiatric Research; Alexander and Margaret Stewart Trust; National Institute of Mental Health (U.S.) (Grant R01 MH 077139); National Institute of Mental Health (U.S.) (Grant RC2 MH089905); Sylvan C. Herman Foundatio

DSpace@MIT

Carolina Digital Repository

Tradeoff Between Stability and Multispecificity in the Design of Promiscuous Proteins

Author: A Barabasi
A del Sol
A Houdusse
B Kuhlman
BI Dahiyat
BM Beadle
C Dodge
C Yanover
CJ Tsai
CM Kraemer-Pecore
CM Summa
CT Saunders
CY Chen
D Chin
D Reichmann
DB Gordon
DD Boehr
DN Bolon
E Beitz
E Yosef
EL Humphris
EL Humphris
F Ding
G Grigoryan
GD Friedland
GE Crooks
Gx Xie
H Jeong
IN Berezovsky
J Gsponer
J Karanicolas
J Mason
JDJ Han
JE Donald
JJ Havranek
JM Shifman
Julia M. Shifman
L Li
Leonid A. Mirny
M Fromer
M Fromer
M Fromer
M Ikura
M Ikura
M Schneider
M Shimaoka
M Zhang
MA Schumacher
MA Schumacher
Menachem Fromer
N Tokuriki
NA Rosenberg
O Keskin
O Keskin
O Sharabi
P Carbonell
P Pagel
RL Dunbrack
S Kirkpatrick
S Kumar
S Sankararaman
SH Gellman
T Kortemme
U Alon
V Potapov
W Meador
WL Delano
X Fu
X Hu
Z Hu
Publication venue: Public Library of Science
Publication date: 01/12/2009

Natural proteins often partake in several highly specific protein-protein interactions. They are thus subject to multiple opposing forces during evolutionary selection. To be functional, such multispecific proteins need to be stable in complex with each interaction partner, and, at the same time, to maintain affinity toward all partners. How is this multispecificity acquired through natural evolution? To answer this compelling question, we study a prototypical multispecific protein, calmodulin (CaM), which has evolved to interact with hundreds of target proteins. Starting from high-resolution structures of sixteen CaM-target complexes, we employ state-of-the-art computational methods to predict a hundred CaM sequences best suited for interaction with each individual CaM target. Then, we design CaM sequences most compatible with each possible combination of two, three, and all sixteen targets simultaneously, producing almost 70,000 low energy CaM sequences. By comparing these sequences and their energies, we gain insight into how nature has managed to find the compromise between the need for favorable interaction energies and the need for multispecificity. We observe that designing for more partners simultaneously yields CaM sequences that better match natural sequence profiles, thus emphasizing the importance of such strategies in nature. Furthermore, we show that the CaM binding interface can be nicely partitioned into positions that are critical for the affinity of all CaM-target complexes and those that are molded to provide interaction specificity. We reveal several basic categories of sequence-level tradeoffs that enable the compromise necessary for the promiscuity of this protein. We also thoroughly quantify the tradeoff between interaction energetics and multispecificity and find that facilitating seemingly competing interactions requires only a small deviation from optimal energies. 
We conclude that multispecific proteins have been subjected to a rigorous optimization process that has fine-tuned their sequences for interactions with a precise set of targets, thus conferring their multiple cellular functions.

Arc requires PSD95 for assembly into postsynaptic complexes involved with neural dysfunction and intelligence

Author: Fernández Esperanza
Collins Mark O.
Frank René A.W.
Zhu Fei
Kopanitsa Maksym V.
Nithianantharajah Jess
Lemprière Sarah A.
Fricker David
Elsegood Kathryn A.
McLaughlin Catherine L.
Croning Mike D.R.
Mclean Colin
Armstrong J. Douglas
Hill W. David
Deary Ian J.
Cencelli Giulia
Bagni Claudia
Fromer Menachem
Purcell Shaun M.
Pocklington Andrew J.
Choudhary Jyoti S.
Komiyama Noboru H.
Grant Seth G.N.
Publication venue: Elsevier
Publication date: 01/01/2017

Arc is an activity-regulated neuronal protein, but little is known about its interactions, assembly into multiprotein complexes, and role in human disease and cognition. We applied an integrated proteomic and genetic strategy by targeting a tandem affinity purification (TAP) tag and Venus fluorescent protein into the endogenous Arc gene in mice. This allowed biochemical and proteomic characterization of native complexes in wild-type and knockout mice. We identified many Arc-interacting proteins, of which PSD95 was the most abundant. PSD95 was essential for Arc assembly into 1.5-MDa complexes and activity-dependent recruitment to excitatory synapses. Integrating human genetic data with proteomic data showed that Arc-PSD95 complexes are enriched in schizophrenia, intellectual disability, autism, and epilepsy mutations and normal variants in intelligence. We propose that Arc-PSD95 postsynaptic complexes potentially affect human cognitive function.

Online Research @ Cardiff

Edinburgh Research Explorer

Spiral - Imperial College Digital Repository

ART

White Rose Research Online

University of Melbourne Institutional Repository

FigShare

A framework for the detection of de novo mutations in family-based sequencing data

Author: A Hodgkinson
A Kong
A McKenna
A Ramu
Benjamin M Neale
BM Neale
CA Brownstein
D Earl
DF Conrad
ED Gamsiz
Eric Banks
Genome of the Netherlands Consortium
H Li
H Li
JA Veltman
JJ Michaelson
Kaitlin E Samocha
Kiran V Garimella
Laurent C Francioli
LC Francioli
MA DePristo
Mark A DePristo
Mark J Daly
Menachem Fromer
Mircea Cretu-Stancu
MW Nachman
Paul IW de Bakker
Q Wei
The 1000 Genomes Consortium
Wigard P Kloosterman
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Francioli LC, Cretu-Stancu M, Garimella KV, et al. A framework for the detection of de novo mutations in family-based sequencing data. European Journal of Human Genetics. 2016;25(2):227-233.